Two-Stage Consistency Algorithm
Algorithm Flowchart
The following diagram illustrates the two-stage sequential consistency algorithm used when use_twostage = TRUE in forestsearch().
flowchart TD
A[/"Candidate Subgroup m"/] --> B["<b>STAGE 1: SCREENING</b><br/>Run n.splits.screen splits<br/>(default: 30)"]
B --> C{{"Consistency <<br/>screen.threshold?"}}
C -->|Yes| D["❌ <b>FAIL</b><br/>Candidate eliminated<br/>(clearly non-viable)"]
C -->|No| E["<b>STAGE 2: SEQUENTIAL EVALUATION</b><br/>Initialize with Stage 1 results"]
E --> F["Run batch of splits<br/>(batch.size = 20)"]
F --> G["Compute Wilson CI<br/>for consistency"]
G --> H{{"CI_lower ≥<br/>threshold?"}}
H -->|Yes| I["✅ <b>PASS</b><br/>Early stop<br/>(95% confident above)"]
H -->|No| J{{"CI_upper <<br/>threshold?"}}
J -->|Yes| K["❌ <b>FAIL</b><br/>Early stop<br/>(95% confident below)"]
J -->|No| L{{"Reached<br/>n.splits.max?"}}
L -->|No| F
L -->|Yes| M{{"Final consistency ≥<br/>threshold?"}}
M -->|Yes| N["✅ <b>PASS</b>"]
M -->|No| O["❌ <b>FAIL</b>"]
style A fill:#e1f5fe
style B fill:#fff3e0
style E fill:#fff3e0
style D fill:#ffcdd2
style K fill:#ffcdd2
style O fill:#ffcdd2
style I fill:#c8e6c9
style N fill:#c8e6c9
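The control flow in the flowchart can be sketched in a few lines of base R. This is an illustrative simulation only, not the forestsearch() internals: run_splits() and twostage_sketch() are hypothetical names, each split is replaced by a Bernoulli draw with an assumed true consistency p_true, and the screening threshold of 0.75 is an arbitrary placeholder. The Wilson interval comes from prop.test() with the continuity correction switched off.

```r
# Hypothetical stand-in for evaluating splits: each split "succeeds" with
# probability p_true. The real algorithm re-fits the subgroup on random splits.
run_splits <- function(n, p_true) rbinom(n, size = 1, prob = p_true)

# Illustrative sketch of the flowchart; argument names are assumptions.
twostage_sketch <- function(p_true, threshold = 0.90, screen_threshold = 0.75,
                            n_screen = 30, batch_size = 20, n_max = 500,
                            conf_level = 0.95) {
  # Stage 1: screening on a small number of splits
  res <- run_splits(n_screen, p_true)
  if (mean(res) < screen_threshold) {
    return(list(decision = "FAIL", stage = "screen", n_splits = length(res)))
  }

  # Stage 2: add batches to the Stage 1 results until a decision is reached
  while (length(res) < n_max) {
    n_new <- min(batch_size, n_max - length(res))
    res <- c(res, run_splits(n_new, p_true))

    # prop.test(correct = FALSE) returns the Wilson score interval
    ci <- prop.test(sum(res), length(res), conf.level = conf_level,
                    correct = FALSE)$conf.int

    if (ci[1] >= threshold) {
      return(list(decision = "PASS", stage = "early", n_splits = length(res)))
    }
    if (ci[2] < threshold) {
      return(list(decision = "FAIL", stage = "early", n_splits = length(res)))
    }
  }

  # Maximum number of splits reached: fall back to the point estimate
  list(decision = if (mean(res) >= threshold) "PASS" else "FAIL",
       stage = "max", n_splits = length(res))
}

set.seed(1)
str(twostage_sketch(p_true = 0.97))  # typically stops well before n_max
str(twostage_sketch(p_true = 0.50))  # typically eliminated at screening
```

Candidates whose simulated consistency is far from the threshold stop after one or two batches, while those near the threshold tend to run to n_max.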
Parameter Summary
flowchart LR
subgraph Stage1["<b>Stage 1: Screening</b>"]
P1["n.splits.screen<br/><i>default: 30</i>"]
P2["screen.threshold<br/><i>auto: ~pcons - 2.5 SE</i>"]
P3["min.valid.screen<br/><i>default: 10</i>"]
end
subgraph Stage2["<b>Stage 2: Sequential</b>"]
P4["batch.size<br/><i>default: 20</i>"]
P5["conf.level<br/><i>default: 0.95</i>"]
P6["n.splits.max<br/><i>= fs.splits</i>"]
end
Stage1 --> Stage2
style Stage1 fill:#fff3e0
style Stage2 fill:#e3f2fd
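The automatic screening threshold is described only as roughly pcons - 2.5 SE. One plausible reading, stated here as an assumption rather than the package's actual rule, is that SE is the binomial standard error of the consistency estimate at the target threshold over n.splits.screen splits. The helper auto_screen_threshold() below is hypothetical.

```r
# Hypothetical helper: assumes SE = sqrt(p * (1 - p) / n) evaluated at the
# target consistency threshold p with n = n.splits.screen. The package may
# compute the standard error differently.
auto_screen_threshold <- function(pcons, n_splits_screen = 30, k = 2.5) {
  se <- sqrt(pcons * (1 - pcons) / n_splits_screen)
  pcons - k * se
}

auto_screen_threshold(0.90, 30)  # about 0.763 under this assumption
auto_screen_threshold(0.90, 40)  # more screening splits give a tighter cutoff
```

Under this reading, a candidate is eliminated at Stage 1 only when its observed consistency falls well below the target, which keeps the chance of discarding a viable subgroup small.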
Early Stopping Logic
The Wilson score confidence interval provides the basis for early stopping decisions:
flowchart TD
subgraph CI["Wilson Score CI at conf.level = 0.95"]
A["Current: n_success / n_total"]
A --> B["Compute 95% CI<br/>[lower, upper]"]
end
B --> C{{"lower ≥ threshold"}}
C -->|Yes| D["<b>PASS</b><br/>95% confident<br/>consistency ≥ threshold"]
C -->|No| E{{"upper < threshold"}}
E -->|Yes| F["<b>FAIL</b><br/>95% confident<br/>consistency < threshold"]
E -->|No| G["<b>CONTINUE</b><br/>Need more data"]
style D fill:#c8e6c9
style F fill:#ffcdd2
style G fill:#fff9c4
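The decision rule above can be written out explicitly. The sketch below computes the Wilson score interval from its closed form and maps it to PASS, FAIL, or CONTINUE; wilson_ci() and stop_decision() are illustrative names, not functions exported by the package.

```r
# Wilson score interval for a binomial proportion (closed form)
wilson_ci <- function(n_success, n_total, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  p_hat <- n_success / n_total
  denom <- 1 + z^2 / n_total
  centre <- (p_hat + z^2 / (2 * n_total)) / denom
  half <- z * sqrt(p_hat * (1 - p_hat) / n_total +
                     z^2 / (4 * n_total^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Early stopping rule: compare the interval with the consistency threshold
stop_decision <- function(n_success, n_total, threshold, conf_level = 0.95) {
  ci <- wilson_ci(n_success, n_total, conf_level)
  if (ci[["lower"]] >= threshold) {
    "PASS"
  } else if (ci[["upper"]] < threshold) {
    "FAIL"
  } else {
    "CONTINUE"
  }
}

stop_decision(50, 50, threshold = 0.90)  # "PASS": lower bound is about 0.929
stop_decision(30, 50, threshold = 0.90)  # "FAIL": upper bound is about 0.724
stop_decision(46, 50, threshold = 0.90)  # "CONTINUE": interval spans 0.90
```

The same interval is returned by prop.test(n_success, n_total, correct = FALSE)$conf.int, which can serve as a cross-check.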
Comparison: Fixed vs Two-Stage
flowchart LR
subgraph Fixed["<b>Fixed-Sample</b><br/>(use_twostage = FALSE)"]
F1["Run exactly<br/>fs.splits splits"] --> F2["Compute<br/>consistency"] --> F3["Pass/Fail<br/>decision"]
end
subgraph TwoStage["<b>Two-Stage</b><br/>(use_twostage = TRUE)"]
T1["Stage 1<br/>Screen"] --> T2["Stage 2<br/>Sequential"] --> T3["Early stop<br/>or max splits"]
end
Fixed -.->|"Predictable runtime<br/>Exact reproducibility"| Use1["Regulatory<br/>submissions"]
TwoStage -.->|"3-10x faster<br/>Adaptive"| Use2["Exploratory<br/>analysis"]
style Fixed fill:#e8eaf6
style TwoStage fill:#e8f5e9
Code Example
# Two-stage with custom parameters
result <- forestsearch(
  df.analysis = trial_data,
  hr.threshold = 1.25,
  pconsistency.threshold = 0.90,
  fs.splits = 500,
  use_twostage = TRUE,
  twostage_args = list(
    n.splits.screen = 40,
    batch.size = 25,
    conf.level = 0.95
  ),
  details = TRUE
)
# Check which algorithm was used
result$grp.consistency$algorithm
#> [1] "twostage"
When Two-Stage Provides Maximum Benefit
| Scenario | Expected Speedup |
|---|---|
| Many candidates clearly fail at Stage 1 | 5-10x |
| True consistency well above threshold | 3-5x |
| True consistency well below threshold | 3-5x |
| Large fs.splits (>200) | Higher benefit |
| Most candidates near threshold | Minimal |